What are the four common types of atomic vectors? What are the two rare types?
The four common types are logical, double, integer, and character.
list(
c(TRUE, FALSE),
c(1.41, 5.45),
c(1L, 2L),
c("banana", "apple")
) %>%
map(typeof)
[[1]]
[1] "logical"
[[2]]
[1] "double"
[[3]]
[1] "integer"
[[4]]
[1] "character"
The two rare types are complex (complex numbers) and raw (binary data, displayed as hexadecimal values).
# Build example complex vector
vec_complex <- complex(2)
vec_complex[[1]] <- 1
vec_complex[[2]] <- 2 + 3i
# Profile it
## Call vector directly to see complex data
vec_complex
[1] 1+0i 2+3i
## Check type
vec_complex %>%
typeof()
[1] "complex"
# Build example raw vector
vec_raw <- raw(2)
vec_raw[[1]] <- as.raw(15)
vec_raw[[2]] <- charToRaw("P")
# Profile it
## Call vector directly to see hex data
vec_raw
[1] 0f 50
## Check type
vec_raw %>%
typeof()
[1] "raw"
## See stored data as decimal integers
vec_raw %>%
as.integer()
[1] 15 80
## Convert hex 0x50 to character
vec_raw[[2]] %>%
rawToChar()
[1] "P"
What are attributes? How do you get them and set them?
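Attributes are name-value pairs of metadata attached to an object, used to build more complex data structures from atomic vectors. For example, dim turns a vector into a matrix, and names gives each element of a vector a name. Individual attributes are read and set with attr(), all attributes are retrieved at once with attributes(), and several can be set at once with structure().
For example:
x <- c(1:6)
# Turn the vector into a matrix (note that matrices are filled column-wise)
dim(x) <- c(2, 3)
x
[,1] [,2] [,3]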
[1,] 1 3 5
[2,] 2 4 6
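Getting and setting attributes can be sketched as follows. This is a minimal illustration; the attribute name "note" is made up for the example and has no special meaning in R.

```r
# Set one attribute at a time with attr()
x <- 1:6
attr(x, "dim") <- c(2, 3)      # same effect as dim(x) <- c(2, 3)
attr(x, "note") <- "example"   # "note" is a hypothetical attribute name

# Read a single attribute, or all of them as a named list
attr(x, "note")
attributes(x)

# Set several attributes at construction time with structure()
y <- structure(1:3, names = c("a", "b", "c"), note = "example")
names(y)
```

Most attributes, apart from names and dim, are dropped by many operations, so they are best suited to metadata that does not need to survive transformations.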
How is a list different from an atomic vector? How is a matrix different from a data frame?
Every element of an atomic vector has the same type, and the values are stored in the vector itself. A list is a vector whose elements are references to other objects, so each element can be of any type and size, including another list.
A matrix is an atomic vector with a dim attribute, so all of its values must share one type. A data frame is a list of equal-length vectors, so each column may have a different type.
Can you have a list that is a matrix? Can a data frame have a column that is a matrix?
List that is a matrix: By assigning dimensions to a list:
x <- list(
c(1, 2),
c(3, 4),
c(5, 6),
c(7, 8)
)
dim(x) <- c(2, 2)
str(x)
List of 4
$ : num [1:2] 1 2
$ : num [1:2] 3 4
$ : num [1:2] 5 6
$ : num [1:2] 7 8
- attr(*, "dim")= int [1:2] 2 2
tree(x)
<list>
├─<dbl [2]>1, 2
├─<dbl [2]>3, 4
├─<dbl [2]>5, 6
└─<dbl [2]>7, 8
x
[,1] [,2]
[1,] numeric,2 numeric,2
[2,] numeric,2 numeric,2
A matrix as a column of a data frame: by assigning it to a column, making sure the number of rows matches:
x <- data.frame(
names = c("a", "b", "c")
)
x$val <- matrix(c(1:15), nrow = 3, ncol = 5)
str(x)
'data.frame': 3 obs. of 2 variables:
$ names: chr "a" "b" "c"
$ val : int [1:3, 1:5] 1 2 3 4 5 6 7 8 9 10 ...
# Note that rendering of this dataframe is inconsistent
print(x)
names val.1 val.2 val.3 val.4 val.5
1 a 1 4 7 10 13
2 b 2 5 8 11 14
3 c 3 6 9 12 15
reactable(x)
datatable(x)
gt(x)
Warning in body[[colname]][row_index] <- process_text(text = vals, context =
context): number of items to replace is not a multiple of replacement length
| names | val |
|---|---|
| a | 1 |
| b | 2 |
| c | 3 |
How do tibbles behave differently from data frames?
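Tibbles never coerce strings to factors (data.frame() did so by default before R 4.0.0), never transform non-syntactic column names, print only the first rows together with the column types, and subset more strictly: [ always returns a tibble, and $ does not partially match column names.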
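The subsetting differences are easiest to see from the data frame side. The sketch below uses only base R and a made-up data frame df; it shows two data frame behaviours that tibbles avoid (partial matching in $ and silent dropping to a vector in [).

```r
# A small illustrative data frame (hypothetical column names)
df <- data.frame(abc = 1:3, xyz = c("a", "b", "c"))

# `$` partially matches column names in a data frame
df$a

# `[` with a single column silently drops to a vector
class(df[, "abc"])

# drop = FALSE keeps the data frame, which is what a tibble always does
class(df[, "abc", drop = FALSE])
```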
How do you create raw and complex scalars? (See ?raw and ?complex.)
Many ways to create raw vectors:
# Create vector containing raw data
x <- c(as.raw(40), as.raw(50))
x
[1] 28 32
typeof(x)
[1] "raw"
# Create vector, then coerce it to raw
x <- c(40, 50) %>%
as.raw()
x
[1] 28 32
typeof(x)
[1] "raw"
# Create empty raw vector then fill slots
x <- raw(2)
x[[1]] <- as.raw(40)
x[[2]] <- as.raw(50)
x
[1] 28 32
typeof(x)
[1] "raw"
Many ways to create complex vectors:
# Create vector containing complex data
x <- c(1 + 6i, 1.7 + 3.2i)
x
[1] 1.0+6.0i 1.7+3.2i
typeof(x)
[1] "complex"
# Create vector, then coerce it to complex
## Note: no imaginary part, initially
x <- c(1, 1.7)
x
[1] 1.0 1.7
typeof(x)
[1] "double"
x <- x %>%
as.complex()
x
[1] 1.0+0i 1.7+0i
typeof(x)
[1] "complex"
# Create empty complex vector then fill slots
x <- complex(2)
x[[1]] <- 1 + 6i
x[[2]] <- 1.7 + 3.2i
x
[1] 1.0+6.0i 1.7+3.2i
typeof(x)
[1] "complex"
Test your knowledge of the vector coercion rules by predicting the output of the following uses of c():
c(1, FALSE)
c("a", 1)
c(TRUE, 1L)
# Coerced to double
c(1, FALSE) %>%
typeof()
[1] "double"
# Coerced to character
c("a", 1) %>%
typeof()
[1] "character"
# Coerced to integer
c(TRUE, 1L) %>%
typeof()
[1] "integer"
Why is 1 == "1" true? Why is -1 < FALSE true? Why is "one" < 2 false?
The comparison operators coerce both operands to a common type before comparing, following the order logical, integer, double, character.
# First comparison: both sides coerced to character, so "1" == "1"
c(1, "1") %>%
typeof()
[1] "character"
# Second comparison: both sides coerced to numeric, so -1 < 0
c(-1, FALSE) %>%
typeof()
[1] "double"
c(-1, FALSE)
[1] -1 0
# Third comparison: both sides coerced to character, so "one" and "2" are compared by sort order, and "one" sorts after "2"
c("one", 2) %>%
typeof()
[1] "character"
c("one", 2)
[1] "one" "2"
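The three comparisons can also be run directly, next to explicitly coerced equivalents. This is a small sketch; the string comparison relies on digits sorting before letters, which holds in common locales.

```r
# The comparisons from the question
1 == "1"          # compared as "1" == "1"
-1 < FALSE        # compared as -1 < 0
"one" < 2         # compared as "one" < "2"

# The same comparisons with the coercion written out
as.character(1) == "1"
-1 < as.numeric(FALSE)
"one" < as.character(2)
```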
Why is the default missing value, NA, a logical vector? What’s special about logical vectors? (Hint: think about c(FALSE, NA_character_).)
Logical is the lowest type in the coercion hierarchy, so a logical NA is always coerced to the type of the rest of the vector and never forces the other elements to change type. Where no coercion is needed, NA stays logical.
typeof(NA)
[1] "logical"
typeof(NA_integer_)
[1] "integer"
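A short sketch of how NA adapts to its neighbours, and of what the hint in the question shows: a typed NA such as NA_character_ forces coercion of the other elements, while a plain NA never does.

```r
# Plain NA takes on the type of the other elements
typeof(c(1L, NA))
typeof(c(1.5, NA))
typeof(c("a", NA))

# A character NA forces the logical value to become a string
x <- c(FALSE, NA_character_)
typeof(x)
x
```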
Precisely what do is.atomic(), is.numeric(), and is.vector() test for?
is.atomic() checks whether the object is an atomic vector (one of the six atomic types, or a direct extension of one).
is.numeric() checks whether the data is interpretable as numbers (base type double or integer, and not a factor).
is.vector() checks whether the object is a vector (lists included) with no attributes other than names. Since a matrix is a vector with a dim attribute, it fails this test.
# Test a few scenarios
list(
list = list(),
matrix = matrix(
c(1:12),
nrow = 3,
ncol = 4
),
logical = c(TRUE, FALSE),
integer = c(4L, 6L),
double = c(2.4, 5.2),
character = c("cook", "no"),
factor = factor(c("red", "blue"))
) %>%
map(\(x) {
c(
is.atomic(x),
is.numeric(x),
is.vector(x)
)
})
$list
[1] FALSE FALSE TRUE
$matrix
[1] TRUE TRUE FALSE
$logical
[1] TRUE FALSE TRUE
$integer
[1] TRUE TRUE TRUE
$double
[1] TRUE TRUE TRUE
$character
[1] TRUE FALSE TRUE
$factor
[1] TRUE FALSE FALSE
How is setNames() implemented? How is unname() implemented? Read the source code.
setNames() is a thin wrapper around the names<- replacement function that returns the modified object, which makes it usable inline.
setNames
function (object = nm, nm)
{
names(object) <- nm
object
}
<bytecode: 0x0000008020c7ebd0>
<environment: namespace:stats>
unname() is more complicated:
- If the object has a names attribute, it is set to NULL.
- If the object has a dimnames attribute, it is set to NULL, unless the object is a data frame, in which case this only happens when force is TRUE.
unname
function (obj, force = FALSE)
{
if (!is.null(names(obj)))
names(obj) <- NULL
if (!is.null(dimnames(obj)) && (force || !is.data.frame(obj)))
dimnames(obj) <- NULL
obj
}
<bytecode: 0x000000801a6d4660>
<environment: namespace:base>
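A brief usage sketch of both functions, with made-up names and values, showing the vector and matrix cases from the source code above:

```r
# setNames() returns the named object, so it can be used inline
x <- stats::setNames(1:3, c("a", "b", "c"))
names(x)

# unname() drops names from a vector
names(unname(x))

# ... and dimnames from a matrix
m <- matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("c1", "c2")))
dimnames(unname(m))
```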
What does dim() return when applied to a 1-dimensional vector? When might you use NROW() or NCOL()?
dim() returns NULL when applied to a 1-dimensional vector.
x <- c(1, 2, 3)
dim(x)
NULL
NROW() and NCOL() work on anything that can be treated as 2-dimensional, such as vectors, matrices, and data frames. They differ from nrow() and ncol() in that they treat a vector as a matrix with one column, instead of returning NULL.
The length of the 1st dimension is the number of rows, the length of the 2nd dimension is the number of columns.
x <- list(
vector = c(1:18),
matrix = matrix(1:18, nrow = 6, ncol = 3),
array = array(1:18, dim = c(2, 3, 3))
)
# Test with nrow, ncol
map(x, \(x) c(nrow(x), ncol(x)))
$vector
NULL
$matrix
[1] 6 3
$array
[1] 2 3
# Test with NROW, NCOL
map(x, \(x) c(NROW(x), NCOL(x)))
$vector
[1] 18 1
$matrix
[1] 6 3
$array
[1] 2 3
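One situation where NROW() is handy is a helper that should accept either a plain vector or a data frame. The function count_obs() below is a hypothetical example, not part of any package:

```r
# Hypothetical helper: number of observations in a vector, matrix, or data frame
count_obs <- function(x) NROW(x)

v <- 1:18
nrow(v)            # NULL: a plain vector has no dim attribute
count_obs(v)       # the vector is treated as an 18 x 1 matrix
NCOL(v)
count_obs(mtcars)  # same call works for a data frame
```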
How would you describe the following three objects? What makes them different from 1:5?
x1 <- array(1:5, c(1, 1, 5))
x2 <- array(1:5, c(1, 5, 1))
x3 <- array(1:5, c(5, 1, 1))
x4 <- c(1:5)
x1 is a 3D array with 1 row, 1 column, and a third dimension of length 5.
x2 is a 3D array with 1 row, 5 columns, and a third dimension of length 1.
x3 is a 3D array with 5 rows, 1 column, and a third dimension of length 1.
They are 3-dimensional objects with a dim attribute, whereas 1:5 has no dim attribute.
x <- list(
x1,
x2,
x3,
x4
)
map(x, \(x) dim(x))
[[1]]
[1] 1 1 5
[[2]]
[1] 1 5 1
[[3]]
[1] 5 1 1
[[4]]
NULL
map(x, \(x) str(x))
int [1, 1, 1:5] 1 2 3 4 5
int [1, 1:5, 1] 1 2 3 4 5
int [1:5, 1, 1] 1 2 3 4 5
int [1:5] 1 2 3 4 5
[[1]]
NULL
[[2]]
NULL
[[3]]
NULL
[[4]]
NULL
An early draft used this code to illustrate structure():
structure(1:5, comment = "my attribute")
[1] 1 2 3 4 5
But when you print that object you don’t see the comment attribute. Why? Is the attribute missing, or is there something else special about it?
The attribute is still there, but print() deliberately does not show the comment attribute. It can be read with comment() or attributes(). See ?comment.
# Create test object
x <- structure(1:5, comment = "my attribute")
# Check type
typeof(x)
[1] "integer"
# Check that the attribute is assigned
str(attributes(x))
List of 1
$ comment: chr "my attribute"
# Print
print(x)
[1] 1 2 3 4 5
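The comment attribute has its own accessor pair in base R, comment() and comment<-(). A minimal sketch of reading and replacing it:

```r
x <- structure(1:5, comment = "my attribute")

# Read the comment with its dedicated getter
comment(x)

# Replace it with the matching setter, then confirm via attr()
comment(x) <- "a new comment"
attr(x, "comment")
```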
What sort of object does table() return? What is its type? What attributes does it have? How does the dimensionality change as you tabulate more variables?
table() returns a contingency table: an object of class "table" built on an integer array, with dim and dimnames attributes in addition to the class attribute. It holds the counts for each combination of values of the tabulated factors or vectors. The number of array dimensions grows by one with each variable added to the table.
# Test with mtcars dataset
mtcars %>%
reactable(defaultPageSize = 4)
# Create list of all contingency tables
vars <- list(
c("cyl"),
c("cyl", "am"),
c("cyl", "am", "gear"),
c("cyl", "am", "gear", "carb")
)
contingency_tables <- vars %>%
map(\(vars) {
mtcars %>%
dplyr::select(all_of(vars)) %>%
table()
})
# Show dimension of all cases
map(
contingency_tables,
\(x) dim(x)
)
[[1]]
[1] 3
[[2]]
[1] 3 2
[[3]]
[1] 3 2 3
[[4]]
[1] 3 2 3 6
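The class, type, and attributes of the result can be inspected directly. A base R sketch using two of the mtcars variables from above:

```r
tab1 <- table(mtcars$cyl)
tab2 <- table(mtcars$cyl, mtcars$am)

# Class and underlying type
class(tab1)
typeof(tab1)

# Attributes carried by the table
names(attributes(tab1))

# One dimension per tabulated variable
dim(tab1)
dim(tab2)
```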
What happens to a factor when you modify its levels?
f1 <- factor(letters)
levels(f1) <- rev(levels(f1))
The underlying integer vector is unchanged, but because the levels attribute is reversed, each integer code now maps to a different label. In effect, the data is changed, so don't use rev() on the levels for this purpose.
# Profile original factor
f1 <- factor(letters)
f1
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
as.integer(f1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
# Profile reversed factor
levels(f1) <- rev(levels(f1))
f1
[1] z y x w v u t s r q p o n m l k j i h g f e d c b a
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
as.integer(f1)
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
What does this code do? How do f2 and f3 differ from f1?
f2 <- rev(factor(letters))
f3 <- factor(letters, levels = rev(letters))
f1: The underlying integer vector is unchanged, but the levels are reversed, so the data is modified.
f2: The underlying integer vector is reversed, but the levels are not, so the data is modified (the elements appear in reverse order).
f3: Both the underlying integer vector and the levels are reversed, so the data is (in effect) unchanged.
# Create a completely unchanged factor for reference
f0 <- factor(letters)
factors <- list(
f0 = f0,
f1 = f1,
f2 = f2,
f3 = f3
)
# f1 and f2 are effectively reversed
factors
$f0
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
$f1
[1] z y x w v u t s r q p o n m l k j i h g f e d c b a
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
$f2
[1] z y x w v u t s r q p o n m l k j i h g f e d c b a
Levels: a b c d e f g h i j k l m n o p q r s t u v w x y z
$f3
[1] a b c d e f g h i j k l m n o p q r s t u v w x y z
Levels: z y x w v u t s r q p o n m l k j i h g f e d c b a
# f2 and f3 are reversed in the integer vector
factors %>%
map(as.integer)
$f0
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
$f1
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
[26] 26
$f2
[1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2
[26] 1
$f3
[1] 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2
[26] 1
# f1 and f3 have their levels reversed
factors %>%
map(levels)
$f0
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
$f1
[1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j" "i" "h"
[20] "g" "f" "e" "d" "c" "b" "a"
$f2
[1] "a" "b" "c" "d" "e" "f" "g" "h" "i" "j" "k" "l" "m" "n" "o" "p" "q" "r" "s"
[20] "t" "u" "v" "w" "x" "y" "z"
$f3
[1] "z" "y" "x" "w" "v" "u" "t" "s" "r" "q" "p" "o" "n" "m" "l" "k" "j" "i" "h"
[20] "g" "f" "e" "d" "c" "b" "a"
List all the ways that a list differs from an atomic vector.
Why do you need to use unlist() to convert a list to an atomic vector? Why doesn’t as.vector() work?
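A list is already a vector, so as.vector() returns it as a list. A list can also contain lists or other objects that do not fit inside an atomic vector; unlist() has extra logic to flatten these recursively and coerce the result to a common type.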
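The difference shows up on a small nested list (values made up for the example):

```r
x <- list(1, 2L, list(3, "four"))

# as.vector() leaves the list as it is
typeof(as.vector(x))

# unlist() flattens recursively and coerces to the most general type
flat <- unlist(x)
typeof(flat)
flat
```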
Compare and contrast c() and unlist() when combining a date and date-time into a single vector.
# Experiment: Set up vectors
date <- c(
as.Date("1970-01-01"),
as.Date("2001-09-11")
)
date_time <- c(
as.POSIXct("2018-12-01 21:00", tz = "UTC"),
as.POSIXct("2021-06-01 12:00", tz = "UTC")
)
a <- c(date, date_time)
b <- unlist(list(date, date_time))
# The original vectors have classes Date and POSIXct, both built on doubles
date
[1] "1970-01-01" "2001-09-11"
typeof(date)
[1] "double"
date_time
[1] "2018-12-01 21:00:00 UTC" "2021-06-01 12:00:00 UTC"
typeof(date_time)
[1] "double"
# Profile a
## a is of class Date. The date-times were converted to dates, dropping the time information
a
[1] "1970-01-01" "2001-09-11" "2018-12-01" "2021-06-01"
typeof(a)
[1] "double"
### a contains the number of days since the epoch
as.integer(a)
[1] 0 11576 17866 18779
# Profile b
## b is a plain double vector. All attributes were stripped from both original vectors, leaving only the numeric data behind. The first two elements are days since the epoch, the last two are seconds since the epoch.
b
[1] 0 11576 1543698000 1622548800
typeof(b)
[1] "double"
Can you have a data frame with zero rows? What about zero columns?
x <- data.frame()
Yes and yes: a data frame can have zero rows, zero columns, or both. This can also occur via subsetting.
mtcars[0, 0]
data frame with 0 columns and 0 rows
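The two cases can also be produced independently of each other. A short base R sketch:

```r
# Zero rows, all columns kept
no_rows <- mtcars[0, ]
dim(no_rows)

# Zero columns, all rows kept
no_cols <- mtcars[, 0]
dim(no_cols)

# Neither rows nor columns
dim(data.frame())
```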
What happens if you attempt to set rownames that are not unique?
For a data frame, an error is raised, since each row is meant to represent a unique observation.
x <- mtcars[c(1:4), ]
rownames(x) <- c("car", "car", "bike", "bicycle")
Warning: non-unique value when setting 'row.names': 'car'
Error in `.rowNamesDF<-`(x, value = value): duplicate 'row.names' are not allowed
If df is a data frame, what can you say about t(df), and t(t(df))? Perform some experiments, making sure to try different column types.
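Both results are matrices, not data frames. t() first coerces the data frame to a matrix with as.matrix(), and at this step all values are coerced to the same type. t(t(df)) has the original dimensions again but keeps the coerced type.
getAnywhere(t.data.frame)
A single object matching 't.data.frame' was found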
It was found in the following places
package:base
registered S3 method for t from namespace base
namespace:base
with value
function (x)
{
x <- as.matrix(x)
NextMethod("t")
}
<bytecode: 0x000000801a111cb0>
<environment: namespace:base>
# Experiment with diamonds
diamonds[1:5, ] %>%
reactable(defaultPageSize = 4)
# Experiment with dataframe containing list columns
df_experiment <- tibble(
x = c(1, 2, 3),
y = list(
c(4, 5, 6),
c(7, 8, 9),
c(10, 11, 12)
)
)
# Transpose only numeric. Data is of type double, unchanged.
x <- diamonds[1:5, c("depth", "x")] %>%
t()
typeof(x)
[1] "double"
x
[,1] [,2] [,3] [,4] [,5]
depth 61.50 59.80 56.90 62.4 63.30
x 3.95 3.89 4.05 4.2 4.34
# Transpose again. Data is of type double, unchanged.
x <- x %>%
t()
typeof(x)
[1] "double"
x
depth x
[1,] 61.5 3.95
[2,] 59.8 3.89
[3,] 56.9 4.05
[4,] 62.4 4.20
[5,] 63.3 4.34
# Transpose mix of numeric and factors. Data is coerced to type character.
x <- diamonds[1:5, c("depth", "color")] %>%
t()
# Transpose again. Data is coerced to type character.
x <- x %>%
t()
typeof(x)
[1] "character"
x
depth color
[1,] "61.5" "E"
[2,] "59.8" "E"
[3,] "56.9" "E"
[4,] "62.4" "I"
[5,] "63.3" "J"
# Transpose df containing list-cols. df is coerced to a list matrix (a list with dimensions) instead.
x <- df_experiment %>%
t()
typeof(x)
[1] "list"
x
[,1] [,2] [,3]
x 1 2 3
y numeric,3 numeric,3 numeric,3
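The same experiment can be reproduced without any packages. The sketch below uses a small made-up data frame with one numeric and one character column:

```r
df <- data.frame(n = c(1.5, 2), s = c("a", "b"))
tdf <- t(df)
ttdf <- t(t(df))

is.matrix(tdf)        # a matrix, not a data frame
typeof(tdf)           # every value coerced to character
dim(ttdf)             # the original shape is restored ...
is.data.frame(ttdf)   # ... but the data frame class is not
typeof(ttdf)          # and the values stay character
```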
What does as.matrix() do when applied to a data frame with columns of different types? How does it differ from data.matrix()?
# as.matrix coerces to most general type (character)
x <- as.matrix(diamonds)
typeof(x)
[1] "character"
x %>%
reactable(defaultPageSize = 4)
# data.matrix replaces factors by their internal codes, then coerces to numeric.
y <- data.matrix(diamonds)
typeof(y)
[1] "double"
y %>%
reactable(defaultPageSize = 4)
# data.matrix coerces characters to factors, then replaces them in the usual way.
z <- diamonds %>%
mutate(
cut = cut %>%
as.character(),
color = color %>%
as.character(),
clarity = clarity %>%
as.character(),
)
str(z)
tibble [53,940 × 10] (S3: tbl_df/tbl/data.frame)
$ carat : num [1:53940] 0.23 0.21 0.23 0.29 0.31 0.24 0.24 0.26 0.22 0.23 ...
$ cut : chr [1:53940] "Ideal" "Premium" "Good" "Premium" ...
$ color : chr [1:53940] "E" "E" "E" "I" ...
$ clarity: chr [1:53940] "SI2" "SI1" "VS1" "VS2" ...
$ depth : num [1:53940] 61.5 59.8 56.9 62.4 63.3 62.8 62.3 61.9 65.1 59.4 ...
$ table : num [1:53940] 55 61 65 58 58 57 57 55 61 61 ...
$ price : int [1:53940] 326 326 327 334 335 336 336 337 337 338 ...
$ x : num [1:53940] 3.95 3.89 4.05 4.2 4.34 3.94 3.95 4.07 3.87 4 ...
$ y : num [1:53940] 3.98 3.84 4.07 4.23 4.35 3.96 3.98 4.11 3.78 4.05 ...
$ z : num [1:53940] 2.43 2.31 2.31 2.63 2.75 2.48 2.47 2.53 2.49 2.39 ...
z <- data.matrix(z)
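typeof(z)
[1] "double"
z %>%
reactable(defaultPageSize = 4)
In summary, as.matrix() coerces every column to the most general type present (character for this data frame), whereas data.matrix() converts every column to numeric, replacing factors and character columns by integer codes.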